Multilingual Single-Document Summarization with MUSE

نویسندگان

  • Marina Litvak
  • Mark Last
چکیده

MUltilingual Sentence Extractor (MUSE) is aimed at multilingual single-document summarization. MUSE implements a supervised language-independent summarization approach based on optimization of multiple sentence ranking methods using a Genetic Algorithm. The main advantage of MUSE is its language-independency – it is using statistical sentence features, which can be calculated for sentences in any language. In our previous work, the performance of MUSE was found to be significantly better than the best known state-of-the-art extractive summarization approaches and tools in three different languages: English, Hebrew, and Arabic. Moreover, our experimental results in the cross-lingual domain suggest that MUSE does not need to be retrained on a summarization corpus in each new language, and the same weighting model can be used across several languages (Last and Litvak, 2012). MUSE participated in the MultiLing 2013 single document summarization task on three languages: English, Hebrew and Arabic. Due to a very limited time that was given to the participants to run their systems on the MultiLing 2013 data, the results submitted to evaluation were obtained by summarizing the documents using models pre-trained on different corpora. As such, no training has been performed on the MultiLing 2013 corpus. 1 MUltilingual Sentence Extractor (MUSE): Overview

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

MUSE – A Multilingual Sentence Extractor

MUltilingual Sentence Extractor (MUSE) is aimed at multilingual single-document summarization. MUSE implements the supervised language-independent summarization approach based on optimization of multiple statistical sentence ranking methods. The MUSE tool consists of two main modules: the training module activated in the offline mode, and the on-line summarization module. The training module ca...

متن کامل

A New Approach to Improving Multilingual Summarization Using a Genetic Algorithm

Automated summarization methods can be defined as “language-independent,” if they are not based on any languagespecific knowledge. Such methods can be used for multilingual summarization defined by Mani (2001) as “processing several languages, with summary in the same language as input.” In this paper, we introduce MUSE, a languageindependent approach for extractive summarization based on the l...

متن کامل

MUSEEC: A Multilingual Text Summarization Tool

The MUSEEC (MUltilingual SEntence Extraction and Compression) summarization tool implements several extractive summarization techniques – at the level of complete and compressed sentences – that can be applied, with some minor adaptations, to documents in multiple languages. The current version of MUSEEC provides the following summarization methods: (1) MUSE – a supervised summarizer, based on ...

متن کامل

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

متن کامل

Searching and Summarizing in a Multilingual Environment

Multilingual aspects have been gaining more and more attention in recent years. This trend has been accentuated by the global integration of European states and the vanishing cultural and social boundaries. The ever increasing use of foreign languages is due to the information boom caused by the emergence of easy internet access. Multilingual text processing has become an important field bringi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013